Gyu Sang Choi Yeungnam University

## PRAM and NAND Flash Memory, and B+Tree in PRAM



NAND Flash Memory **PRAM** Hybrid Storage of PRAM and NAND Flash Memory PRAM Translation Layer (PTL) B+Tree in PRAM Conclusions and Future Work

Outline



## **NAND Flash Memory**

Erase-before-write
Block and Page
Out-of-place update
Asymmetry between read and write latency
Up to 10<sup>5</sup> erase numbers
The page size increases as time goes

| NAND Flash Type        | SLC     | MLC     |
|------------------------|---------|---------|
| 4KB Page Read Latency  | ~25 us  | ~60 us  |
| 4KB Page Write Latency | ~200 us | ~800 us |
| Block Erase Latency    | ~1.5 ms | ~1.5 ms |

\* K9XXG08XXA and K9F8G08UXM Data Sheet



In-place update Byte-addressable ■ 10<sup>6</sup> write numbers No erase operation Asymmetry between read and write latency Capacity increases

PRAM\*

| PRAM                  | Latency |
|-----------------------|---------|
| 64Bytes Read Latency  | ~50ns   |
| 64Bytes Write Latency | ~1us    |

Shimin chen, Phillip B. Gibbons and Suman Nath, Rethinking Database Algorithms for Phase Change Memory, CIDR 2011
 PRAM, PCM, PCRAM are interchangeably used, and PRAM will be only used in this presentation

## Hybrid Storage of PRAM and NAN Storage of PRAM and NAN **Flash Memory**

#### Prior works

- (PRAM and Flash File System) PFFS\* had been proposed (PFFS) to store meta-data into PRAM
  - The meta-data are more frequently updated compared to user data
  - Use a complex wear-leveling mechanism by cold and hot area swapping
- Kim et al<sup>#</sup> proposed hybrid storage of PRAM and NAND Flash memory which stores meta-data of both file system and NAND Flash memory into PRAM
  - Reduce the write numbers using word-level comparison
  - No result on PRAM's life span

#### Motivation

Solve PRAM's endurance perfectly

<sup>#</sup>Kim et al., A PRAM and NAND Flash Hybrid Architecture for High-Performance Embedded Storage Subsystems, in EMSOFT '08, 2008, pp. 31– 5

## Hybrid Storage of PRAM and NAN (\* 영남대학교) Flash Memory

### The Proposed Scheme



**Internal Architecture of Flash-based SSD** 



 $\mathbf{X}$ 

Internal Architecture of the Proposed Hybrid SSD OS'12



Meta-date Management in the Proposed Scheme

## Hybrid Storage of PRAM and NAN <sup>Storage</sup> University Flash Memory

#### The Proposed Scheme



 $\rightarrow$ 

## Hybrid Storage of PRAM and NAN 응 않보대학교 Flash Memory

#### Experimental Setup

#### Modify FlashSim\* to support PRAM

| Work-   | Trace     | Average | Average  | Read    |
|---------|-----------|---------|----------|---------|
| load    | Duration  | Number  | Number   | Percen- |
|         | (Minutes) | of I/O  | of Mbits | tage(%) |
|         |           | per sec | per sec  |         |
| Trace1  | 145       | 11.76   | 1.74     | 59.93   |
| Trace2  | 62        | 11.29   | 3.00     | 52.92   |
| Trace3  | 75        | 21.35   | 3.39     | 43.44   |
| Trace4  | 113       | 7.5     | 1.60     | 52.32   |
| Trace5  | 143       | 10.06   | 1.82     | 44.52   |
| Trace6  | 155       | 4.58    | 1.39     | 52.63   |
| Trace7  | 44        | 8.30    | 3.48     | 33.99   |
| Overall | 737       | 9.99    | 2.04     | 49.81   |

| Work-      | Trace     | Average | Average  | Read    |
|------------|-----------|---------|----------|---------|
| load       | Duration  | Number  | Number   | Percen- |
|            | (Minutes) | of I/O  | of Mbits | tage(%) |
|            |           | per sec | per sec  |         |
| OLTP1      | 728       | 123.74  | 3.22     | 28.07   |
| OLTP2      | 683       | 90.24   | 1.68     | 82.63   |
| WebSearch1 | 52        | 326.86  | 35.59    | 99.97   |
| WebSearch2 | 256       | 328.96  | 38.47    | 99.98   |

| NAND Flash Memory Organization |             |  |  |
|--------------------------------|-------------|--|--|
| Characteristics                | Description |  |  |
| Block                          | 64 Pages    |  |  |
| Page                           | 2 Kbytes    |  |  |
| Capacity                       | 32 Gbytes   |  |  |
| Page Read Latency              | 25 us       |  |  |
| Page Write Latency             | 200 us      |  |  |
| Block Erase Latency            | 1.5 ms      |  |  |

| PRAM Organization    |             |  |  |
|----------------------|-------------|--|--|
| Characteristics      | Description |  |  |
| Max Read Throughput  | 266MB/sec   |  |  |
| Max Write Throughput | 9MB/sec     |  |  |
| Capacity             | 1Gbits      |  |  |

\*Y. Kim, B. Tauras, A. Gupta, B. Urgaonkar, Flashsim: A simulator for nand flash-based solid-state drives, Proceedings of International Conference on Advances in System Simulation, 2009, pp. 125–131

## Hybrid Storage of PRAM and NAN Flash Memory

### Results



PC Workloads



| NAND Flash  | Page Size (Bytes) |      |     |     |
|-------------|-------------------|------|-----|-----|
| Memory Size |                   |      |     |     |
| (Bytes)     | 2K                | 4K   | 8K  | 16K |
| 8G          | 16M               | 8M   | 4M  | 2M  |
| 16G         | 32M               | 16M  | 8M  | 4M  |
| 32G         | 64M               | 32M  | 16M | 8M  |
| 64G         | 128M              | 64M  | 32M | 16M |
| 128G        | 256M              | 128M | 64M | 32M |

The Required PRAM Size to Manage Meta-data of NAND Flash Memory



#### Motivation

- Use PRAM as main memory or storage
- Need to solve PRAM's endurance problem
  - The prior proposed wear-leveling mechanisms are mainly related to bit-level write reduction

Propose a new wear-level mechanism





#### The Proposed Scheme

| Physical Page<br>Number | Logical Page<br>Number |
|-------------------------|------------------------|
| 0                       | 5 → N                  |
| 1                       | 5'                     |
| 2                       | 80                     |
| 3                       | 0                      |
| 4                       | 300                    |
| 5                       | 200                    |
| 6                       | 2                      |
| 7                       | 4                      |
|                         |                        |

| 124                     | 110    |  |  |
|-------------------------|--------|--|--|
| 125                     | 90     |  |  |
| 126                     | 82     |  |  |
| 127                     | 127 34 |  |  |
| Map Table from PA to LA |        |  |  |

. . . .

Meta-data Area in PRAM

|    | ogical Page<br>umber | Physical Page<br>Number | Physical<br>Page Nu |
|----|----------------------|-------------------------|---------------------|
| Ы  | 0                    | 3                       | 1                   |
|    | 1                    | 9                       | <b>o</b> /-         |
|    | 2                    | 6                       | ·>                  |
|    | 3                    | 8                       |                     |
|    | 4                    | 0                       | 🕴                   |
| L) | 5                    | $0 \rightarrow 1$       |                     |
|    | 6                    | 10                      |                     |
|    | 7                    | 4                       |                     |
|    | • •                  | ••                      |                     |
|    | 124                  | 30                      |                     |
|    | 125                  | 15                      |                     |
|    | 126                  | 12                      | N-1                 |
|    | 127                  | 11                      | L S                 |
|    | Man Table 6          |                         |                     |

Map lable from LA to PA

DRAM



mber

**Physical** Address

#### **User-data Area in PRAM**

**Logical Page Address** 

5

 $\mathbf{X}$ 

 $\mathbf{X}$ 



### Experimental Setup

 $\mathbf{X}$ 

Modify FlashSim\* to support PRAM

| Work-   | Trace     | Average | Average  | Read    |
|---------|-----------|---------|----------|---------|
| load    | Duration  | Number  | Number   | Percen- |
|         | (Minutes) | of I/O  | of Mbits | tage(%) |
|         |           | per sec | per sec  |         |
| Trace1  | 145       | 11.76   | 1.74     | 59.93   |
| Trace2  | 62        | 11.29   | 3.00     | 52.92   |
| Trace3  | 75        | 21.35   | 3.39     | 43.44   |
| Trace4  | 113       | 7.5     | 1.60     | 52.32   |
| Trace5  | 143       | 10.06   | 1.82     | 44.52   |
| Trace6  | 155       | 4.58    | 1.39     | 52.63   |
| Trace7  | 44        | 8.30    | 3.48     | 33.99   |
| Overall | 737       | 9.99    | 2.04     | 49.81   |

| Work-      | Trace     | Average | Average  | Read    |
|------------|-----------|---------|----------|---------|
| load       | Duration  | Number  | Number   | Percen- |
|            | (Minutes) | of I/O  | of Mbits | tage(%) |
|            |           | per sec | per sec  |         |
| OLTP1      | 728       | 123.74  | 3.22     | 28.07   |
| OLTP2      | 683       | 90.24   | 1.68     | 82.63   |
| WebSearch1 | 52        | 326.86  | 35.59    | 99.97   |
| WebSearch2 | 256       | 328.96  | 38.47    | 99.98   |

\*Y. Kim, B. Tauras, A. Gupta, B. Urgaonkar, Flashsim: A simulator for nand flash-based solid-state drives, Proceedings of International Conference on Advances in System Simulation, 2009, pp. 125–131



### Experimental Setup

| Characteristics    | PRAM1     | PRAM2     | PRAM3     |  |
|--------------------|-----------|-----------|-----------|--|
| Capacity           | 32 GBytes | 32 GBytes | 32 GBytes |  |
| Page               | 2 KBytes  | 2 KBytes  | 2 KBytes  |  |
| Word Read Latency  | 120 ns    | 14 ns     | 80 ns     |  |
| Word Write Latency | 1.12 µs   | 424 ns    | 10 µs     |  |

| Characteristics     | Description |
|---------------------|-------------|
| Block               | 64 Pages    |
| Page                | 2 KBytes    |
| Capacity            | 32 GBytes   |
| Page Read Latency   | 25 µs       |
| Page Write Latency  | 200 µs      |
| Block Erase Latency | 1.5 ms      |
|                     |             |

NVRAMOS'12

 $\mathbf{X}$ 



| PRAM    | Page Size (Bytes) |      |      |     |     |  |
|---------|-------------------|------|------|-----|-----|--|
| Size    |                   |      |      |     |     |  |
| (Bytes) | 512               | 1K   | 2K   | 4K  | 8K  |  |
| 1G      | 16M               | 8M   | 4M   | 2M  | 1M  |  |
| 2G      | 32M               | 16M  | 8M   | 4M  | 2M  |  |
| 4G      | 64M               | 32M  | 16M  | 8M  | 4M  |  |
| 8G      | 128M              | 64M  | 32M  | 16M | 8M  |  |
| 16G     | 256M              | 128M | 64M  | 32M | 16M |  |
| 32G     | 512M              | 256M | 128M | 64M | 32M |  |

The Required PRAM Size to Manage Meta-data of NAND Flash Memory

NVRAMOS'12

¥



## **B+Tree in PRAM**

#### B+Tree

- All paths from root to leaf are of the same length
- Each node has between  $\lceil n/2 \rceil$  and <u>n</u> pointers. Each leaf node stores between  $\lceil (n-1)/2 \rceil$  and <u>n-1</u> values.
- *n* is called fanout (it corresponds to the maximum number of pointers/children). The value (*n-1*)/2 is called order (it corresponds to the minimum number of values).

#### • Special cases:

- If the root is not a leaf, it has at least 2 children.
- If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n-1) values.





## **B+Tree in PRAM**

### Prior Works

 $\rightarrow$ 

#### B+Tree with unsorted leaf nodes\*



- The Proposed Scheme
  - Divide a node into Area1 and Area2
  - Area2 is much smaller than Area1
  - A new key/record is first inserted into Area2
  - If Area2 is full, merge Area1 and Area2 into Area1
  - Search Both Area1 and Area2



#### The Proposed Scheme: Insert Operation

Insert 40, 80, 60, 30, 70, 50, 35, 45, 55 and 65 sequentially



 $\rightarrow$ 



### The Proposed Scheme: Split Operation

#### Merge Area1 and Area2

| _  |    | 1  | 4  |    |         |    |    |    |    |     |    |    | ¥   |
|----|----|----|----|----|---------|----|----|----|----|-----|----|----|-----|
| 40 |    |    |    |    |         |    |    |    |    |     |    |    | 40  |
| 13 | 30 | 35 | 40 | 45 | 50      | 55 | bU | 65 | 70 | 80  |    |    | 10  |
|    | 1  |    |    |    | Allinin |    |    |    |    | 1.0 | 11 | 10 | 1.2 |
| U  | A  | 2  | 3  | 4  | 5       | 6_ |    | ŏ  | 9  | 10  |    | 12 | 13  |

#### Insert 75



# 2 3 4 5 6 7 8 9 10 11 12 13 V 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1</

9

8

0 1 2 3 4 5 6 7

NVRAMOS'12

 $\mathbf{X}$ 







### Experimental Setup

| CPU               | Intel Core2 Duo 2.4GHz           |
|-------------------|----------------------------------|
| L2 Cache Size     | 4Mbytes                          |
| Front-side Bus    | 1.066 GHz                        |
| Main Memory       | 3Gbytes                          |
| Storage Interface | Serial ATA                       |
| HDD               | Seagate Barracuda 7200.10        |
|                   | ST3250820AS, 250Gbytes           |
| SSD               | Mtron MSD-SATA3525, SLC 32Gbytes |
| OS                | Linux 2.6.34                     |
|                   |                                  |

| DRAM               |             | PCM                |             |  |
|--------------------|-------------|--------------------|-------------|--|
| Characteristics    | Description | Characteristics    | Description |  |
| Word Read Latency  | 3.5 ns      | Word Read Latency  | 5 ns        |  |
| Word Write Latency | 3.5 ns      | Word Write Latency | 62.5 ns     |  |

| NAND Flash m                | emory         | HDD                  |             |  |
|-----------------------------|---------------|----------------------|-------------|--|
| Characteristics Description |               | Characteristics      | Description |  |
| Page Read Latency           | $25 \ \mu s$  | Sector Read Latency  | 12.7 ms     |  |
| Page Write Latency          | $200 \ \mu s$ | Sector Write Latency | 12.7 ms     |  |

 $\mathbf{\mathbf{x}}$ 

\*

#### The Proposed Scheme: Area2's size and Endurance



#### Write Numbers per Node





#### Write Number per Record within a Node

The Comparison of Write Numbers by Varying the Size of Area2

영남대학교 Yeungnam University





## **B+Tree in PRAM**

NVRAMOS'12

#### Results: Insert and Retrieve Operations







## **B+Tree in PRAM**

### Results: Endurance

 $\mathbf{X}$ 





## **Conclusions and Future Works**

- The proposed hybrid storage of PRAM and NAND solve the PRAM's endurance problem perfectly
- PTL could be used in a storage device and even a memory controller
- P+TREE shows better performance and endurance compared to the original scheme
- Will propose new data structures for PRAM
   Evaluate the proposed schemes in real PRAM board.

NVRAMOS'12

# THANKS! QUESTIONS?